AITopics | international symposium

international symposium

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural Code Comprehension: A Learnable Representation of Code Semantics

Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

Neural Information Processing Systems, Feb-12-2026, 08:11:19 GMT

Neural Information Processing Systems http://nips.cc/

international conference, proceedings, representation, (12 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > District of Columbia > Washington (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

c383e44d9a878d1982d9abb838bd5d8a-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 19:16:15 GMT

arxiv preprint arxiv, pattern recognition, preprint arxiv, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Analyzing Planner Design Trade-offs for MAPF under Realistic Simulation

Yan, Jingtian, Li, Zhifei, Kang, William, Smith, Stephen F., Li, Jiaoyang

arXiv.org Artificial Intelligence, Dec-11-2025

Multi-Agent Path Finding (MAPF) algorithms are increasingly deployed in industrial warehouses and automated manufacturing facilities, where robots must operate reliably under real-world physical constraints. However, existing MAPF evaluation frameworks typically rely on simplified robot models, leaving a substantial gap between algorithmic benchmarks and practical performance. Recent frameworks such as SMART incorporate kinodynamic modeling and offer the MAPF community a platform for large-scale, realistic evaluation. Building on this capability, this work investigates how key planner design choices influence performance under realistic execution settings. We systematically study three fundamental factors: (1) the relationship between solution optimality and execution performance, (2) the sensitivity of system performance to inaccuracies in kinodynamic modeling, and (3) the interaction between model accuracy and plan optimality. We examine these factors empirically to understand how these design choices affect performance in realistic scenarios. We highlight open challenges and research directions to steer the community toward practical, real-world deployment.
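Classical MAPF formulations typically model each robot as an agent that moves one grid cell per timestep, and treat a plan as valid when no two agents occupy the same cell at the same time (a vertex conflict) or swap cells along the same edge (an edge conflict). The Python sketch below shows such a check under these assumptions: cells are `(row, col)` tuples, every move takes exactly one timestep, and agents wait at their goals once their paths end. The function names are hypothetical and unrelated to SMART or the authors' code.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col) on a grid

def position(path: List[Cell], t: int) -> Cell:
    """Cell occupied at timestep t; an agent waits at its goal after its path ends."""
    return path[min(t, len(path) - 1)]

def first_conflict(paths: List[List[Cell]]) -> Optional[Tuple[str, int, int, int]]:
    """Return (kind, agent_i, agent_j, timestep) for the first conflict found, or None."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now, b_now = position(paths[i], t), position(paths[j], t)
                # Vertex conflict: both agents occupy the same cell at time t.
                if a_now == b_now:
                    return ("vertex", i, j, t)
                # Edge conflict: the agents swap cells between t and t + 1.
                a_next, b_next = position(paths[i], t + 1), position(paths[j], t + 1)
                if a_now == b_next and b_now == a_next:
                    return ("edge", i, j, t)
    return None
```

Nothing in this check reflects acceleration limits or execution delays, which is the gap between simplified benchmarks and realistic execution that the study examines.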

artificial intelligence, mapf model, robot, (14 more...)

arXiv.org Artificial Intelligence

2512.09736

Genre: Research Report (0.84)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

Zhou, Zhongchun, Lai, Chengtao, Gu, Yuhang, Zhang, Wei

arXiv.org Artificial Intelligence, Dec-9-2025

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and their asynchronous management, we investigate the opposite end of the design spectrum: a multi-core AI accelerator equipped with a shared system-level cache and application-aware management policies, which keeps the programming effort modest. Our approach exploits dataflow information available in the software stack to guide cache replacement (including dead-block prediction), in concert with bypass decisions and mechanisms that alleviate cache thrashing. We assess the proposal using a cycle-accurate simulator and observe substantial performance gains (up to 1.80x speedup) compared with conventional cache architectures. In addition, we build and validate an analytical model that accounts for the actual overlapping behaviors, in order to extend the measurement results of our policies to real-world, larger-scale workloads. Experimental results show that, when functioning together, our bypassing and thrashing-mitigation strategies can handle scenarios both with and without inter-core data sharing and achieve remarkable speedups. Finally, we implement the design in RTL; its area is 0.064 mm². Our findings illustrate the potential of the shared-cache design to assist the development of future AI accelerator systems.

With the advent of the artificial intelligence (AI) era, the demand for AI-tailored hardware has surged across various environments, from data centers to embedded systems. These accelerators span a broad spectrum, from power-efficient devices to those designed for high computational throughput [34]. AI accelerators, compared with Graphics Processing Units (GPUs), can be optimized for AI applications and tailored for specific scenarios, such as pre-defined neural network (NN) computation graphs, operator types, certain data precision, and given power budgets. Since they are often used in scenarios where the execution graph is known during compilation, they typically employ software-controlled scratchpad memories (SPMs) as the on-chip storage.

A preliminary version of this paper appeared in the proceedings of ICS 2024. Z. Zhou and C. Lai contributed equally to this work. Z. Zhou and C. Lai are with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: zzhouch@connect.ust.hk). Y. Gu is with the School of Electronic Science and Engineering, Southeast University, Nanjing, Jiangsu, China. W. Zhang (corresponding author) is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: eeweiz@ust.hk).
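To make the terms concrete, here is a toy Python model of a fully associative cache in which every access carries the number of times the block will still be reused, information a compiler could in principle derive from a static dataflow graph. With hints enabled, blocks with no remaining uses are bypassed on a miss, and blocks predicted dead are evicted before falling back to LRU. The trace format, the `simulate` function, and the policy details are illustrative assumptions, not DCO's actual replacement, bypass, or thrashing-mitigation mechanisms.

```python
from collections import OrderedDict
from typing import Iterable, Tuple

def simulate(trace: Iterable[Tuple[str, int]], capacity: int, use_hints: bool) -> int:
    """Return the number of hits for a fully associative cache.

    Each trace entry is (block_id, remaining_uses), where remaining_uses is how many
    more times the block will be accessed after this one (a hypothetical compiler hint).
    With use_hints=False the cache behaves as plain LRU and ignores the hints.
    """
    cache: "OrderedDict[str, int]" = OrderedDict()  # block -> remaining uses, oldest first
    hits = 0
    for block, remaining in trace:
        if block in cache:
            hits += 1
            cache[block] = remaining
            cache.move_to_end(block)  # now the most recently used block
            continue
        if use_hints and remaining == 0:
            continue  # bypass: a block with no future reuse is not worth caching
        if len(cache) >= capacity:
            victim = None
            if use_hints:
                # Dead-block prediction: prefer evicting a block with no remaining uses.
                victim = next((b for b, r in cache.items() if r == 0), None)
            if victim is None:
                victim = next(iter(cache))  # otherwise evict the least recently used block
            del cache[victim]
        cache[block] = remaining
    return hits
```

For instance, with a capacity of two blocks, the trace `[("A", 2), ("S1", 0), ("S2", 0), ("A", 1), ("S3", 0), ("A", 0)]`, which interleaves one reused block with single-use streaming blocks, gets one hit under plain LRU and two with the hints, because bypassing keeps the streaming blocks from evicting the reused one.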

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.07312

Country: Asia > China > Hong Kong > Kowloon (0.44)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.87)
Semiconductors & Electronics (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

SystolicAttention: Fusing FlashAttention within a Single Systolic Array

Lin, Jiawei, Li, Yuanlong, Chen, Guokai, Bourgeat, Thomas

arXiv.org Artificial Intelligence, Dec-9-2025

Transformer models rely heavily on the scaled dot-product attention (SDPA) operation, typically implemented as FlashAttention. Characterized by its frequent interleaving of matrix multiplications and softmax operations, FlashAttention fails to fully utilize the compute resources of modern systolic-array-based accelerators designed for consecutive and large matrix multiplications. To fully unleash the performance potential of systolic arrays for FlashAttention, we propose FSA, an enhanced systolic array architecture that runs the entire FlashAttention on the array without external vector units. Combined with SystolicAttention, an optimized kernel for FSA that achieves fine-grained and element-wise overlapping of FlashAttention operations, FSA maximizes array utilization while preserving the original floating-point operation order of FlashAttention. We implement FSA in synthesizable RTL and evaluate its performance against state-of-the-art systolic-array-based accelerators. Our results show that FSA achieves 1.77x and 4.83x higher attention FLOPs/s utilization compared to AWS Neuron-v2 and Google TPUv5e, respectively. We synthesize FSA in a 16 nm technology at 1.5 GHz, and results indicate only a 12% area overhead compared to a standard weight-stationary systolic array.
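The interleaving of matrix multiplications and softmax that the abstract mentions comes from FlashAttention's online softmax: for each block of keys and values, a score matmul is followed by elementwise exponentiation and rescaling of running statistics, then a second matmul. A single-head NumPy sketch of that forward pass, assuming no masking or dropout and an arbitrary block size, is shown below; it illustrates FlashAttention in general and is not the FSA hardware mapping or the SystolicAttention kernel.

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference scaled dot-product attention with a full softmax."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def tiled_attention(q, k, v, block: int = 4):
    """FlashAttention-style forward pass: stream K/V in blocks with an online softmax."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    n = q.shape[0]
    m = np.full((n, 1), -np.inf)      # running row maximum of the scores
    l = np.zeros((n, 1))              # running softmax denominator
    acc = np.zeros((n, v.shape[-1]))  # running unnormalized output
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                                # matmul 1: scores for this block
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)                               # elementwise softmax numerator
        correction = np.exp(m - m_new)                      # rescales earlier partial sums
        l = l * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ vb                     # matmul 2: weighted values
        m = m_new
    return acc / l
```

The rescaling by `correction` is what lets the blocked result match the unblocked softmax up to floating-point rounding, and it is also why the matmuls cannot simply be batched back to back.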

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.11331

Country:

Europe (0.94)
North America > United States > California (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting

Yu, Zhongkai, Guan, Yue, Yu, Zihao, Zhou, Chenyang, Hu, Zhengding, Pei, Shuyi, Kang, Yangwook, Ding, Yufei, Tsai, Po-An

arXiv.org Artificial Intelligence, Dec-8-2025

Large-scale Mixture of Experts (MoE) Large Language Models (LLMs) have recently become the frontier open-weight models, achieving capabilities similar to those of proprietary models. However, their random expert selection mechanism introduces significant data movement overhead that becomes the dominant bottleneck in multi-unit LLM serving systems. To understand the patterns underlying this data movement, we conduct comprehensive data-movement-centric profiling across four state-of-the-art large-scale MoE models released in 2025 (200B-1000B parameters) using over 24,000 requests spanning diverse workloads. We perform a systematic analysis from both temporal and spatial perspectives and distill six key insights to guide the design of diverse future serving systems. Using these insights, we then demonstrate how to improve wafer-scale GPUs as a case study, and show that minor architectural modifications leveraging the insights achieve substantial performance gains, delivering 5.3x and 3.1x average speedups on DeepSeek V3 and Qwen3, respectively. Our work presents the first comprehensive data-centric analysis of large-scale MoE models and a concrete design study using the learned lessons, with profiling traces and a simulation framework already open-sourced with over 1k downloads. Our traces and results are publicly available at https://huggingface.co/datasets/core12345/MoE_expert_selection_trace
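In a typical MoE layer, a learned router scores every expert for each token and dispatches the token to its top-k experts; when experts are sharded across devices, every (token, expert) pair whose expert sits on another device implies data movement. The NumPy sketch below counts per-expert load and remote dispatches under simple assumptions: experts are placed on devices in contiguous groups of `experts_per_device`, and each token starts on a known device. The helper names are hypothetical, and production routers add details such as score normalization and load balancing that are omitted here.

```python
import numpy as np

def route_top_k(logits: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring experts for each token (one row of logits per token)."""
    return np.argsort(-logits, axis=-1, kind="stable")[:, :k]

def expert_load(expert_ids: np.ndarray, num_experts: int) -> np.ndarray:
    """Number of tokens dispatched to each expert."""
    return np.bincount(expert_ids.ravel(), minlength=num_experts)

def remote_dispatches(expert_ids: np.ndarray, token_device: np.ndarray,
                      experts_per_device: int) -> int:
    """Count (token, expert) pairs whose expert lives on a different device than the token."""
    expert_device = expert_ids // experts_per_device  # contiguous expert-to-device placement
    return int((expert_device != token_device[:, None]).sum())
```

With two tokens on device 0 and four experts split over two devices, logits of `[[0.1, 0.9, 0.3, 0.7], [0.8, 0.6, 0.2, 0.1]]` route the tokens to experts `[1, 3]` and `[0, 1]`, so only one of the four dispatches crosses devices.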

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.05497

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Semiconductors & Electronics (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Sparse Computations in Deep Learning Inference

Tasou, Ioanna, Mpakos, Panagiotis, Vlachos, Angelos, Adamopoulos, Dionysios, Giannakopoulos, Georgios, Katsikopoulos, Konstantinos, Karaparisis, Ioannis, Lazou, Maria, Loukovitis, Spyridon, Mei, Areti, Poulopoulou, Anastasia, Dimitriou, Angeliki, Filandrianos, Giorgos, Galanopoulos, Dimitrios, Karampinis, Vasileios, Mitsouras, Ilias, Spanos, Nikolaos, Anastasiadis, Petros, Doudalis, Ioannis, Nikas, Konstantinos, Retsinas, George, Tzouveli, Paraskevi, Giannoula, Christina, Koziris, Nectarios, Papadopoulou, Nikela, Stamou, Giorgos, Voulodimos, Athanasios, Goumas, Georgios

arXiv.org Artificial Intelligence, Dec-3-2025

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference also contributes significantly to computational, energy, and environmental footprints. Sparsity stands out as a critical mechanism for drastically reducing these resource demands. However, its potential remains largely untapped and is not yet fully incorporated into production AI systems. To bridge this gap, this work provides the knowledge and insights that performance engineers need to get involved in deep learning inference optimization. In particular, in this work we: a) discuss the various forms of sparsity that can be utilized in DNN inference, b) explain how the original dense computations translate to sparse kernels, c) provide an extensive bibliographic review of the state of the art in the implementation of these kernels for CPUs and GPUs, d) discuss the availability of sparse datasets in support of sparsity-related research and development, e) explore the current software tools and frameworks that provide robust sparsity support, and f) present evaluation results of different implementations of the key SpMM (sparse-dense matrix multiplication) and SDDMM (sampled dense-dense matrix multiplication) kernels on CPU and GPU platforms. Ultimately, this paper aims to serve as a resource for performance engineers seeking to develop and deploy highly efficient sparse deep learning models in production.
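The two kernels named in item f) have compact definitions. With a sparse matrix stored in CSR format (`indptr`, `indices`, `data`), SpMM computes the dense product A·B, while SDDMM evaluates a dense product X·Yᵀ only at the nonzero positions of a sparse matrix S and scales each result by the stored value. The plain-Python loops below are a readable sketch under those assumptions, far from the optimized CPU and GPU implementations the survey evaluates; in practice one would use a library such as `scipy.sparse`, where `csr_matrix(...) @ B` performs the same SpMM.

```python
import numpy as np

def spmm_csr(indptr, indices, data, b):
    """SpMM: dense C = A @ B, where A is sparse in CSR format and B is dense."""
    n_rows = len(indptr) - 1
    c = np.zeros((n_rows, b.shape[1]))
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            c[i] += data[p] * b[indices[p]]  # only the stored nonzeros of row i contribute
    return c

def sddmm_csr(indptr, indices, data, x, y):
    """SDDMM: S * (X @ Y.T) evaluated only at the nonzero positions of sparse S (CSR)."""
    out = np.empty(len(indices))
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            out[p] = data[p] * (x[i] @ y[indices[p]])  # one dot product per nonzero
    return out  # values share S's indptr/indices, so the result stays in CSR
```

Both kernels do work proportional to the number of nonzeros rather than to the full matrix size, which is where the savings from sparsity come from.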

artificial intelligence, machine learning, neural information processing system, (16 more...)

arXiv.org Artificial Intelligence

2512.0255

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (1.00)
Energy (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Multimodal "Puppeteer": Exploring Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality

Zhang, Yuchong, Orthmann, Bastian, Ji, Shichen, Welle, Michael, Van Haastregt, Jonne, Kragic, Danica

arXiv.org Artificial Intelligence, Dec-2-2025

The integration of robotics and augmented reality (AR) offers promising opportunities to enhance human-robot interaction (HRI) by making teleoperation more transparent, spatially grounded, and intuitive. We present a head-mounted AR "puppeteer" framework in which users control a physical robot by interacting with its virtual counterpart through large language model (LLM)-driven voice commands and hand gestures on the Meta Quest 3. In a within-subject user study with 42 participants performing an AR-based robotic pick-and-place pattern-matching task, we compare two interaction conditions: gesture-only (GO) and combined voice+gesture (VG). Our results show that GO currently provides more reliable and efficient control for this time-critical task, while VG adds flexibility but also introduces latency and recognition issues that can increase workload. We further explore how prior robotics experience shapes participants' perceptions of each modality. Based on these findings, we distill a set of evidence-based design guidelines for AR puppeteer-metaphor robot teleoperation, positioning multimodality as an adaptive strategy that must balance efficiency, robustness, and user expertise rather than assuming that additional modalities are universally beneficial. Our work contributes empirical insights into how multimodal (voice+gesture) interaction influences task efficiency, usability, and user experience in AR-based HRI.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2506.13189

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

CAMformer: Associative Memory is All You Need

Molom-Ochir, Tergel, Morris, Benjamin F., Horton, Mark, Wei, Chiyue, Guo, Cong, Taylor, Brady, Liu, Peter, Wang, Shan X., Fan, Deliang, Li, Hai Helen, Chen, Yiran

arXiv.org Artificial Intelligence, Nov-26-2025

Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x higher energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators, while maintaining near-lossless accuracy.
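The idea of scoring queries against keys by bit matching can be imitated in software: binarize queries and keys by sign, count matching bits (the bit width minus the Hamming distance), and keep only the top-k keys per query. The NumPy sketch below does that and then, as an assumption of this example only, recomputes a full-precision softmax over the surviving keys. It is a digital analogue for intuition, not CAMformer's voltage-domain BA-CAM, its two-stage filtering, or its contextualization stage; the function names are hypothetical.

```python
import numpy as np

def binary_scores(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Matching sign bits between every query and key (bit width minus Hamming distance)."""
    qb, kb = q > 0, k > 0
    return (qb[:, None, :] == kb[None, :, :]).sum(axis=-1)

def topk_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, top_k: int) -> np.ndarray:
    """Filter keys by binary score, then apply softmax attention over the survivors.

    The full-precision rescoring step is an assumption of this sketch.
    """
    keep = np.argsort(-binary_scores(q, k), axis=-1, kind="stable")[:, :top_k]
    out = np.empty((q.shape[0], v.shape[1]))
    for i, idx in enumerate(keep):
        s = k[idx] @ q[i] / np.sqrt(q.shape[1])  # scores for the kept keys only
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ v[idx]
    return out
```

With `top_k` equal to the number of keys, the filter keeps everything and the result reduces to ordinary softmax attention; smaller values trade accuracy for fewer full-precision operations.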

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.1974

Country: North America > United States (0.46)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.71)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format

Agwa, Shady, Shen, Yikang, Wang, Shiwei, Prodromakis, Themis

arXiv.org Artificial Intelligence, Nov-24-2025

Nowadays, we are witnessing an Artificial Intelligence revolution that dominates the technology landscape in various application domains, such as healthcare, robotics, automotive, security, and defense. Massive-scale AI models, which mimic the human brain's functionality, typically feature millions or even billions of parameters and rely on data-intensive matrix multiplication. While conventional von Neumann architectures struggle with the memory wall and the end of Moore's Law, these AI applications are rapidly migrating to the edge, for example to robotics and unmanned aerial vehicles for surveillance, which further constrains the hardware budget of edge AI architectures. Although in-memory computing has been proposed as a promising solution to the memory wall, both analog and digital in-memory computing architectures suffer substantial degradation of their promised benefits due to various design limitations. We propose a new digital in-memory stochastic computing architecture, DISCA, that uses a compressed version of the quasi-stochastic Bent-Pyramid data format. DISCA inherits the computational simplicity of analog computing while preserving the scalability, productivity, and reliability of digital systems. Post-layout modeling results show that DISCA achieves an energy efficiency of 3.59 TOPS/W per bit at 500 MHz in a commercial 180 nm CMOS technology. Therefore, when scaled and compared with its counterpart architectures, DISCA improves the energy efficiency of matrix multiplication workloads by orders of magnitude.
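Stochastic computing, the family DISCA belongs to, represents a value in [0, 1] as the probability of a 1 in a bitstream; for two independent unipolar streams, a single AND gate per bit then computes the product. The Python sketch below shows that textbook scheme with pseudo-random streams; it does not implement the compressed Bent-Pyramid format, whose encoding is not described in this abstract, and the function names are hypothetical.

```python
import random
from typing import List

def to_stream(p: float, length: int, rng: random.Random) -> List[int]:
    """Unipolar encoding: each bit is 1 with probability p (p in [0, 1])."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a: float, b: float, length: int = 4096, seed: int = 0) -> float:
    """Estimate a * b by ANDing two independently generated unipolar bitstreams."""
    rng = random.Random(seed)
    sa = to_stream(a, length, rng)
    sb = to_stream(b, length, rng)  # later draws from the generator, independent of sa
    ones = sum(x & y for x, y in zip(sa, sb))  # one AND gate per bit pair
    return ones / length  # the fraction of 1s decodes the product
```

With random streams, accuracy improves only slowly with stream length (the error shrinks roughly with the square root of the length); reducing this cost is a common motivation for deterministic and quasi-stochastic encodings.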

artificial intelligence, machine learning, operation, (13 more...)

arXiv.org Artificial Intelligence

2511.17265

Country:

Europe (0.68)
North America > United States > Michigan (0.28)

Genre: Research Report (0.85)

Industry:

Semiconductors & Electronics (0.88)
Health & Medicine (0.74)
Information Technology > Robotics & Automation (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (0.74)
Information Technology > Artificial Intelligence > Machine Learning (0.47)
